Skip to content

[ENH] V1 → V2 API Migration - Tasks#1611

Open
satvshr wants to merge 411 commits into
openml:mainfrom
satvshr:tasks
Open

[ENH] V1 → V2 API Migration - Tasks#1611
satvshr wants to merge 411 commits into
openml:mainfrom
satvshr:tasks

Conversation

@satvshr

@satvshr satvshr commented Jan 9, 2026

Copy link
Copy Markdown
Contributor

Metadata

@codecov-commenter

codecov-commenter commented Jan 9, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 16.35220% with 133 lines in your changes missing coverage. Please review.
✅ Project coverage is 27.63%. Comparing base (fabbc28) to head (54f7ede).

Files with missing lines Patch % Lines
openml/_api/resources/task.py 12.69% 110 Missing ⚠️
openml/tasks/task.py 18.75% 13 Missing ⚠️
openml/tasks/functions.py 9.09% 10 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##             main    #1611       +/-   ##
===========================================
- Coverage   81.50%   27.63%   -53.88%     
===========================================
  Files          63       63               
  Lines        5240     5262       +22     
===========================================
- Hits         4271     1454     -2817     
- Misses        969     3808     +2839     

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@geetu040 geetu040 mentioned this pull request Jan 9, 2026
18 tasks
@satvshr satvshr marked this pull request as ready for review January 12, 2026 15:29

@geetu040 geetu040 left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From a high-level review, I noticed a few points that need adjustment:

  • Caching can likely be removed from the SDK, since these concerns should be handled by the base client.
  • I don't see the api_context being used in tasks/functions, so it's not clear to me how the SDK is actually using the new API interface here.
  • Instead of moving entire methods out of tasks/functions.py, it would be better to stick to the goal of minimal SDK changes while enabling v2 support.
  • API calls should be updated at the specific root functions (for example _get_task_description, OpenMLTask._download_split).
  • For listing tasks, please follow the approach discussed in #1575 comment.

Comment thread examples/Advanced/fetch_evaluations_tutorial.py
@satvshr satvshr marked this pull request as draft January 14, 2026 20:25
@satvshr satvshr changed the title [ENH] Tasks Migration [ENH] V1 → V2 API Migration - Tasks Jan 15, 2026

@geetu040 geetu040 left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have left some comments, please take a look and make sure the signature of all methods in TasksAPI, TasksV1 and TasksV2 stay same.

Comment thread openml/_api/resources/base.py Outdated
Comment thread openml/_api/resources/base.py Outdated
Comment thread openml/_api/resources/tasks.py Outdated
Comment thread openml/_api/resources/tasks.py Outdated
Comment thread openml/_api/resources/tasks.py Outdated
Comment thread openml/_api/resources/tasks.py Outdated
Comment thread openml/_api/resources/tasks.py Outdated
Comment thread openml/_api/resources/tasks.py Outdated
Comment thread openml/_api/resources/tasks.py Outdated
Comment thread openml/tasks/functions.py
@satvshr

satvshr commented Jan 16, 2026

Copy link
Copy Markdown
Contributor Author

@geetu040 I will explain the confusion I am facing to the best of my abilities, and I do feel communicating on the PR threads is yielding no results, hence I will just put everything here:

  1. Over here you say the base class should have v1 functions like list, delete, and create. This means to me that all V1 related functions should be moved from functions.py to TasksV1 (from SDK to resources) and calls inside functions.py should be replaced with calls to api_context.backend.task.method. Over here, you say something similar, stating that TasksV1, V2, and TasksAPI should have the same function signatures.
  2. Over here and here you mention that we should try to keep most of the code in SDK and not resources (contradictory to point 1).
    Completely contrary to point 1 you mention get_tasks along with create_task and delete_task should stay in sdk over here.

I seem to be getting 2 contradictory messages from your end which is where the confusion arises.

@geetu040

Copy link
Copy Markdown
Collaborator

I seem to be getting 2 contradictory messages from your end which is where the confusion arises.

I don't feel confused with the picture I have in my mind, I think I need to explain it even better. I'll try to document each method in detail what should be moved and what should stay.

@satvshr

satvshr commented Jan 19, 2026

Copy link
Copy Markdown
Contributor Author

Kindly resolve the comments which have been solved, it would be easier to navigate!

@geetu040

geetu040 commented Jan 19, 2026

Copy link
Copy Markdown
Collaborator

Kindly resolve the comments which have been solved, it would be easier to navigate!

It doesn't look like I have the typical "Resolve" button.

Edit: It's visible in the "Files changed" section, I'll resolve them there.

@geetu040 geetu040 left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really nice, the code is in a better shape now. Many lines have been removed in the functions.py which seem fine but I'll take a deeper look. In other comments mostly some cleaning is required.

Comment thread openml/tasks/task.py Outdated
Comment thread openml/tasks/task.py

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we have an api call in _download_split, what do you think about that? also does it relate with the datasets PR?

@satvshr satvshr Jan 21, 2026

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we have an api call in _download_split, what do you think about that?

I remember you and Shrivaths discussing cache paths for V1 and V2 and I think the understanding was that the paths would differ, if that is the case, we should change this later on and leave it as it is right now.

also does it relate with the datasets PR?

Only the get_dataset part, hence the Shrivaths comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this looks really good and clean now, just one open thread which needs discussion now: #1611 (comment)

@satvshr satvshr Jan 21, 2026

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you agreed with my approach, can we not close that too?

Comment thread openml/_api/resources/base.py Outdated

@geetu040 geetu040 left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sdk code look good so far, please take a look at #1575 (comment) and make changes accordingly where needed.
all tests (existing and new) should pass to make sure we are retaining the original functionality of the sdk

Comment thread openml/_api/resources/tasks.py Outdated
task_type: TaskType | int | None = None,
**kwargs: Any,
) -> pd.DataFrame:
raise NotImplementedError("Task listing is not available in API v2 yet.")

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see point 8 in #1575 (comment)

Comment thread openml/tasks/functions.py
return tasks


@openml.utils.thread_safe_if_oslo_installed

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see point 3 in #1575 (comment)

Comment thread openml/tasks/task.py Outdated


class OpenMLTask(OpenMLBase):
class OpenMLTask(OpenMLBase, ResourceAPI):

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should not inherit ResourceAPI

Comment thread openml/tasks/task.py Outdated
source=str(split_url),
output_path=str(cache_file),
)
self._http.download(url=str(split_url), file_name="datasplits.arff")

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use api_context.backend.task.[something] here

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How am I supposed to do that? download is a method in the HTTPClient.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still this should be under a method now called openml._backend.task.[something] and then we can take this problem to TaskV1.[something].

I see this function depends on download of HTTPClient, I'll either add that method in the core PR or we can plan to merge datasets PR before this.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will be moving this to TaskAPI after discussion.

Comment thread openml/tasks/functions.py Outdated
Comment on lines +592 to +245
raise NotImplementedError(
f"Task type ID {task_type:d} is not supported. "
f"Supported task type IDs: {TaskType.SUPERVISED_CLASSIFICATION.value},"
f"{TaskType.SUPERVISED_REGRESSION.value}, "
f"{TaskType.CLUSTERING.value}, {TaskType.LEARNING_CURVE.value}. "
f"Please refer to the TaskType enum for valid task type identifiers."
)
raise NotImplementedError(f"Task type {task_type:d} not supported.")

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's keep it as it was

@satvshr satvshr marked this pull request as ready for review February 4, 2026 15:57
Comment thread openml/_api/resources/task.py Outdated

return pd.DataFrame.from_dict(tasks, orient="index")

def _get_estimation_procedure_list(self) -> builtins.list[dict[str, Any]]:

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@geetu040 We already have implemented this function in the estimation_procedure resource, so I think we don't need it to be here, right?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just checked, once we merge your PR we can replace it with the _get_details function yes.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, that's the plan

Comment thread openml/_api/resources/task.py Outdated
task_type: TaskType | int | None = None,
**kwargs: Any,
) -> pd.DataFrame:
raise NotImplementedError(self._not_supported(method="list"))

@EmanAbdelhaleem EmanAbdelhaleem Feb 4, 2026

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can just use self._not_supported(method="list")

no need for raise NotImplementedError()

@geetu040 geetu040 left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

delete the old files in tests/files so git identifies and marks them as renamed files which would make it easier to track

a lot of irrelevant files are touched in tests, probably from your other pr's pre-commit, please revert them

see the comments on _get_estimation_procedure_list
don't remove this function, keep it as it is, it's updated here #1604 (changes)
call it separately in list_tasks and pass onwards to openml._backend.task.list as a parameter
it should stay out of _api/
keep test__get_estimation_procedure_list

remove TaskV1API and TaskV2API import in openml/tasks/functions.py

Comment thread openml/_api/resources/base/resources.py Outdated
Comment thread openml/_api/resources/task.py Outdated
Comment thread openml/_api/resources/task.py Outdated

return pd.DataFrame.from_dict(tasks, orient="index")

def _get_estimation_procedure_list(self) -> builtins.list[dict[str, Any]]:

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should stay out of _api/

Comment thread openml/tasks/functions.py
Comment thread openml/tasks/functions.py
raise OpenMLCacheException(f"Task file for tid {tid} not cached") from e


def _get_estimation_procedure_list() -> list[dict[str, Any]]:

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't remove this function, keep it as it is, it's updated here #1604 (changes)
call it separately in list_tasks and pass onwards to openml._backend.task.list as a parameter

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We will then have to alter the signature of openml._backend.task.list to accept _get_estimation_procedure_list as a permanent argument, which doesnt make sense?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that would make sense, why not?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The function is being called in multiple places so I assume it is something that will be used a lot, won't this change just prevent positional arguments in the future?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

discussion moved over to discord

Comment thread tests/test_tasks/test_task_functions.py Outdated
Comment thread tests/test_tasks/test_task_functions.py
Comment thread tests/test_tasks/test_task_methods.py
Comment thread tests/test_tasks/test_task_methods.py Outdated
Comment thread tests/conftest.py

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would fail if you delete the old files in tests/files, you can keep it like the suggestion in #1619 (review)

@geetu040 geetu040 left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've updated #1619 (review). Please sync with main and update cache files accordingly.

Comment thread openml/tasks/task.py Outdated
Comment thread openml/tasks/task.py Outdated
Comment thread openml/tasks/task.py

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is cache and api-calls which are needed to be updated in this file. I think you'd only need to update OpenMLTask.download_split. You should use _http.download here. Let me know if you need help.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as discussed on discord, no need to use _http.download, instead use _http.get where applicable.

@geetu040 geetu040 left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@satvshr what's the status of this PR? It has been stale for quite a while now. Are you still working on it, planning to continue it, or waiting for a review?

@satvshr

satvshr commented May 1, 2026

Copy link
Copy Markdown
Contributor Author

@geetu040 you had pushed some commits so I was awaiting instructions on how to proceed next, unless you were picking it up, was my thought.

@geetu040

geetu040 commented May 1, 2026

Copy link
Copy Markdown
Collaborator

@geetu040 you had pushed some commits so I was awaiting instructions on how to proceed next, unless you were picking it up, was my thought.

I think we discussed but I've made the changes in both datasets and this pr. You can continue if you are willing to work on it. That would require first to fix the tests in the CI by updating tests suits with the moved cache files.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[ENH] V1 → V2 API Migration - tasks

7 participants